An Empirical Comparison of the Summarization Power of Graph Clustering Methods

نویسندگان

  • Yike Liu
  • Neil Shah
  • Danai Koutra
چکیده

How do graph clustering techniques compare with respect to their summarization power? How well can they summarize a million-node graph with a few representative structures? Graph clustering or community detection algorithms can summarize a graph in terms of coherent and tightly connected clusters. In this paper, we compare and contrast different techniques: METIS, Louvain, spectral clustering, SlashBurn and KCBC, our proposed k-core-based clustering method. Unlike prior work that focuses on various measures of cluster quality, we use vocabulary structures that often appear in real graphs and the Minimum Description Length (MDL) principle to obtain a graph summary per clustering method. Our main contributions are: (i) Formulation: We propose a summarization-based evaluation of clustering methods. Our method, VOG-OVERLAP, concisely summarizes graphs in terms of their important structures which lead to small edge overlap, and large node/edge coverage; (ii) Algorithm: we introduce KCBC, a graph decomposition technique, in the heart of which lies the k-core algorithm (iii) Evaluation: We compare the summarization power of five clustering techniques on large real graphs, and analyze their compression performance, summary statistics and runtimes.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Graph Hybrid Summarization

One solution to process and analysis of massive graphs is summarization. Generating a high quality summary is the main challenge of graph summarization. In the aims of generating a summary with a better quality for a given attributed graph, both structural and attribute similarities must be considered. There are two measures named density and entropy to evaluate the quality of structural and at...

متن کامل

An Empirical Comparison of Distance Measures for Multivariate Time Series Clustering

Multivariate time series (MTS) data are ubiquitous in science and daily life, and how to measure their similarity is a core part of MTS analyzing process. Many of the research efforts in this context have focused on proposing novel similarity measures for the underlying data. However, with the countless techniques to estimate similarity between MTS, this field suffers from a lack of comparative...

متن کامل

Detecting Overlapping Communities in Social Networks using Deep Learning

In network analysis, a community is typically considered of as a group of nodes with a great density of edges among themselves and a low density of edges relative to other network parts. Detecting a community structure is important in any network analysis task, especially for revealing patterns between specified nodes. There is a variety of approaches presented in the literature for overlapping...

متن کامل

Application of Pattern Recognition Algorithms for Clustering Power System to Voltage Control Areas and Comparison of Their Results

Finding the collapse susceptible portion of a power system is one of the purposes of voltage stability analysis. This part which is a voltage control area is called the voltage weak area. Determining the weak area and adjecent voltage control areas has special importance in the improvement of voltage stability. Designing an on-line corrective control requires the voltage weak area to be determi...

متن کامل

Application of Pattern Recognition Algorithms for Clustering Power System to Voltage Control Areas and Comparison of Their Results

Finding the collapse susceptible portion of a power system is one of the purposes of voltage stability analysis. This part which is a voltage control area is called the voltage weak area. Determining the weak area and adjecent voltage control areas has special importance in the improvement of voltage stability. Designing an on-line corrective control requires the voltage weak area to be determi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1511.06820  شماره 

صفحات  -

تاریخ انتشار 2015